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Abstract 

Background: The Middle East North Africa (MENA) region is under continuous threat of the re-emergence of West 
Nile virus (WNV) and Rift Valley Fever virus (RVF), two pathogens transmitted by the vector species Culex pipiens. 
Predicting areas at high risk for disease transmission requires an accurate model of vector distribution, however, 
most Cx. pipiens distribution modeling has been confined to temperate, forested habitats. Modeling species 
distributions across a heterogeneous landscape structure requires a flexible modeling method to capture variation 
in mosquito response to predictors as well as occurrence data points taken from a sufficient range of habitat types. 

Methods: We used presence-only data from Egypt and Lebanon to model the population distribution of Cx. pipiens 
across a portion of the MENA that also encompasses Jordan, Syria, and Israel. Models were created with a set of 
environmental predictors including bioclimatic data, human population density, hydrological data, and vegetation 
indices, and built using maximum entropy (Maxent) and boosted regression tree (BRT) methods. Models were 
created with and without the inclusion of human population density. 

Results: Predictions of Maxent and BRT models were strongly correlated in habitats with high probability of 
occurrence (Pearson's r = 0.774, r = 0.734), and more moderately correlated when predicting into regions that 
exceeded the range of the training data (r = 0.666,r = 0.558). All models agreed in predicting high probability of 
occupancy around major urban areas, along the banks of the Nile, the valleys of Israel, Lebanon, and Jordan, and 
southwestern Saudi Arabia. The most powerful predictors of Cx. pipiens habitat were human population density 
(60.6% Maxent models, 34.9% BRT models) and the seasonality of the enhanced vegetation index (EVI) (44.7% 
Maxent, 16.3% BRT). Maxent models tended to be dominated by a single predictor. Areas of high probability 
corresponded with sites of independent surveys or previous disease outbreaks. 

Conclusions: Cx. pipiens occurrence was positively associated with areas of high human population density and 
consistent vegetation cover, but was not significantly driven by temperature and rainfall, suggesting human-induced 
habitat change such as irrigation and urban infrastructure has a greater influence on vector distribution in this region 
than in temperate zones. 

Keywords: Species distribution models, Remote sensing, Culex pipiens, Maxent, Public health, Arid region, Irrigation 




Parasites 
^Vectors 



* Correspondence: amyconley@gmail 

department of Geography, University of Miami, 1300 Campo Sano Avenue, 
Coral Gables, FL 33146, USA 

Full list of author information is available at the end of the article 

O© 2014 Conley et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative 
BiolVlGCl C6ntTcll Commons Attribution License (http://creativecommons.Org/licenses/by/4.0), which permits unrestricted use, distribution, and 
reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain 
Dedication waiver (http://creativecommons.Org/publicdomain/zero/1.0/) applies to the data made available in this article, 
unless otherwise stated. 



Conley et al. Parasites & Vectors 2014, 7:289 
http://www.parasitesandvectors.eom/content/7/1/289 



Page 2 of 16 



Background 

With rapidly expanding urban populations, recent refu- 
gee movements, internal displacement, and widespread 
civil strife, parts of the Middle East are becoming in- 
creasingly vulnerable to vector-borne diseases (VBDs). 
Urban growth, in particular, has expanded faster than 
the supply of available housing and supporting infra- 
structure, resulting in a large proportion of the popula- 
tion living in unsanitary conditions, which may increase 
human exposure to bites of infected vectors that prolif- 
erate in urban environments [1-4]. 

Prior to the recent period of civil conflict, the region 
has experienced the emergence and resurgences of two 
deadly diseases. Rift Valley Fever Virus caused an epi- 
demic in Egypt in 1977, 1993 and 2003, and hit Saudi 
Arabia and Yemen in 2000 and 2001 [5-7]. In addition, 
West Nile virus, a nearly "forgotten" disease re-emerged 
in a severe country-wide epidemic in Israel in 2000, with 
a fatality rate of 8.4% [8]. Between 2010 and 2012, 200 
West Nile cases were reported in Israel [9]. While the 
causes for the recent resurgence of neglected tropical 
diseases like West Nile Virus, Rift Valley Fever Virus, 
and Leishmaniasis are not fully understood, it is cause 
for renewed interest and increased attention to disease 
dynamics in the region [2,10,11]. 

Molecular and epidemiological studies have shown 
that Cx. pipiens is a vector of both Rift Valley Fever 
virus and West Nile Virus [6,12-16]. It has a preference 
for living near areas of high human population density, 
and breeding in the artificial containers abundant in 
close proximity to human settlements [17,18] or polluted 
pools of water associated with human activities [19-22]. 
The species also feeds opportunistically from a wide var- 
iety of blood hosts [23,24], and is active nearly year 
round [21]. This vast ecological plasticity makes it a po- 
tentially significant vector. As such, understanding the 
distribution of Cx. pipiens throughout the region is a 
fundamental requirement to understanding transmission 
dynamics. 

Habitat suitability models for Cx. pipiens in general 
have identified rainfall, temperature, and vegetation to 
be major drivers of population distribution [25,26]. 
Density of hosts has also been found to be significantly 
correlated with habitat suitability in related species [27]. 
However, these studies were largely conducted in differ- 
ent climatic regions than those dominant in the MENA 
region, and therefore the identified predictors would not 
necessarily explain vector distributions in arid and semi- 
arid regions. Environmental predictors will be most in- 
formative at scales and across regions where they exhibit 
sufficient heterogeneity to accurately discern suitable 
from unsuitable habitat. The relative information values 
of predictors shift as underlying climatic conditions 
change, and so does the nature and identity of the most 



significant vector-habitat relationship in the model. In 
Egypt, climatic variables such as ambient temperature, 
relative humidity, and wind speed did not significantly 
distinguish sites with high and low filariasis transmission 
vectored by Cx. pipiens, which was positively correlated 
with temperature and negatively correlated with rainfall 
[28]. In Australia, the critical climate factors for predicting 
outbreaks of Ross River virus vary significantly between 
different environmental regions within a single state [29]. 
Simulation models of Cx. quinquefasciatus across the 
southern United States revealed significantly different sen- 
sitivities of mosquito populations to temperature and pre- 
cipitation in arid and humid habitats [30]. 

The aim of the present work is to investigate the rela- 
tionship between the distribution of Cx. pipiens and the 
environment in the Middle East and subsequently derive 
vector distribution surfaces. Accordingly, we used data 
collected from a wide range of habitats in the region and 
two different but robust presence-only species distribu- 
tion models (SDMs), Maxent and boosted regression 
trees (BRT). The models examined the role of human 
populations, climatic, hydrological, and vegetation pa- 
rameters in predicting habitat suitability. 

Methods 

Selecting modeling approaches 

In order to strengthen confidence in predicting Cx. pipiens 
distributions, we incorporated two modeling approaches, 
Maximum entropy implemented in Maxent software, and 
boosted regression trees. Regression models are powerful 
tools for selecting relevant predictors and modeling com- 
plex interactions, while boosting avoids misclassification 
problems inherent in a single tree model. Thus our com- 
bined models would enable powerful selection of relevant 
predictors and accurate modeling of complex interactions. 
The previously discussed non-uniform relationships be- 
tween other Culex species and critical climate factors [30] 
suggest such interactions may possibly contribute signifi- 
cantly to Cx. pipiens distributions as well. Thus, the use of 
model intercomparisons leads to greater confidence in the 
predictions of the distribution. Using Maxent as our alter- 
native method allows for comparisons with the predictions 
of one of the most familiar and frequently applied pres- 
ence only modeling techniques [31-34]. 

Modeling species distribution using maximum entropy 

Maximum entropy is a machine learning algorithm that 
produces predictions of habitat suitability by comparing 
the conditional density of predictors at presence sites 
with the marginal density of predictors across the study 
area. The raw output of Maxent is a probability of habi- 
tat suitability [35,36]. To make model output more inter- 
pretable, Maxent converts the exponential values of its 
raw output estimate of habitat suitability into a logistic 
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output that represents an estimate of probability of pres- 
ence [37]. 

Modeling species distribution using boosted regression 
trees 

Boosted regression trees predict the value of a target 
variable based on the value of several input variables. 
Boosting uses a machine learning algorithm to produce 
a final prediction model that is an ensemble of weak pre- 
diction models, in this case, an ensemble of regression 
trees. The model is built in a stage wise fashion; a re- 
gression tree is fit to the original data, then the residuals 
of that model become the new data values, to which a 
second tree is fit, and so forth. By fitting each subse- 
quent tree in the model to the residuals of the previous 
tree, the data become re-weighted in each iteration. 
Points that were misclassified by the previous model will 
now have more weight than values that were classified 
correctly. As a result, the subsequent tree will focus on 
fitting these misclassified points. The process is condi- 
tioned by the learning rate, which controls the contribu- 
tion of each tree to the final model. 

The final model can consist of thousands of trees, 
however, over-fit models will exaggerate minor fluctua- 
tions in the data, making them poor predictors. Boosted 
regression uses cross validation to minimize over-fitting 
by determining when adding additional trees no longer 



improves predictive performance, and selecting that optimum 
number of trees. 

Predictive error is measured as the Bernoulli residual 
deviance between the predicted values of the model and 
the observed values of the test data. All BRT models are 
fitted in R using the 'gbm' and 'dismo' libraries [38-40]. 

Target species and occurrence data 

Our study area of interest (Figure 1) focuses on an ap- 
proximately 5 million square kilometer region of the 
MENA that encompasses Egypt, Israel, Lebanon, Syria, 
Jordan as well as substantial portions of Turkey, Saudi 
Arabia, and Iraq. 

The Cx. pipiens complex are the most widely distrib- 
uted mosquito species in the world and two biotypes of 
Cx. pipiens L. are present in our study area, the anauto- 
genous and autogenous biotypes. These mosquito bio- 
types breed in overlapping niches and readily hybridize 
in areas where they coexist [41]. In Egypt, both autogen- 
ous and anautogenous Cx. pipiens individuals were en- 
countered in the progeny of autogenous or anautogenous 
female parents and more than 75% of field caught females 
produced mixed progenies [42] . 

Cx. pipiens is the most abundant mosquito in Lebanon, 
collected both indoors and outdoors. Active and abundant 
year round, it is anthropophilic, endophagic, and endo- 
phillic [21]. The species is highly behaviorally plastic, while 
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Cx. pipiens in the Al Sharqyia governorate of Egypt's Nile 
delta also displays endophagic and primarily anthropophi- 
liic behavior [24], mosquitoes in the neighboring gover- 
norate have been observed to switch from indoor to 
outdoor feeding with seasonal shifts in temperature and 
wind speed [43] . Females will feed from a diverse range of 
hosts, including horses, cows, sheep, dogs, cats, humans 
and rats [23] . 

Cx. pipiens breeds in water with high organic content 
[41]. In Lebanon breeding sites for Cx. pipiens include 
containers which range in size from small cans filled 
with water to garden pools and irrigation ditches [21]. In 
the urban habitat of Cairo, the common breeding habi- 
tats for Cx. pipiens are cesspits, drainage canals, springs, 
cesspools, and irrigation ditches [22]. 

Training data from both Egypt and Lebanon were in- 
cluded to maximize sampling of the species' environ- 
mental range, from the hot and hyperarid to cool and 
subhumid. This range of samples maximizes the combi- 
nations of environmental factors used in calibrating the 
models, increasing their predictive power [44,45]. 

Occurrence records for Cx. pipiens were obtained 
from previously unpublished vector surveys in Egypt and 
Lebanon (Figure 1). Adult samples from Lebanon were 
collected in the summer of 2009. Methods included 
CDC light traps with a dry ice attractant, and BG- 
sentinel traps with pheromone attractant. Traps were 
placed in outdoor and indoor habitats at 5 pm and col- 
lected the following morning. In addition to adult sam- 
pling, larval stages of mosquitoes were sampled in the 
summers of 2002 and 2003. Samples were collected by 
the classical dipping method using a white tray and/or a 
ladle from both artificial and natural breeding sites in- 
cluding swamps, ponds, riverbanks, irrigation tanks, and 
water storage tanks. Coordinates for sampling locations 
were obtained from Google Earth and GPS. Adult speci- 
mens were pinned and 4 th instar larval specimens were 
mounted on slides and identified morphologically using 
local and regional identification keys [46]. Occurrence 
records from Egypt (Gad, unpublished data) were col- 
lected using similar methods as part of a nation-wide 
survey. 

Environmental layers 

We considered twenty-four environmental variables as 
potential predictors of Cx. pipiens habitat (Table 1). 
Temperature and precipitation were represented by 19 
variables of the WorldClim dataset [47]. Quality of vege- 
tative cover was described by the standard deviation, 
mean, and maximum values of the enhanced vegetation 
index (EVI), as derived from Moderate Resolution Image 
Spectroradiometer (MODIS) imagery from the Terra 
Satellite for 2001. The EVI provides a measure of photo- 
synthetic activity or landscape greenness and hence 



captures vegetation features such as leaf area, canopy 
cover, sugar resources that may provide resting sites and 
alternate food sources for this vector species [48-50]. As 
available soil water limits primary productivity in the 
study region, mean and maximum EVI are generally 
correlated with annual rainfall, whereas the standard de- 
viation of EVI provides a measure of vegetation season- 
ality, which is controlled by a variety of factors including 
temperature, day length, insolation, irrigation, and biotic 
factors [51]. An additional covariate related to potential 
vector breeding habitat was derived from topographic 
data, the topographical wetness index (TWI), which de- 
scribes the tendency of water to collect in areas of topo- 
graphic minima [52]. Because information regarding the 
distribution of hosts greatly improves models of mos- 
quito distributions [53] and Cx. pipiens is adapted to 
breeding and feeding in human-altered landscapes, we 
also included human population density as a predictor 
using data from Landscan™ 2010. Environmental layers 
were gridded to a spatial resolution of approximately 
1 km (926.63 m), a scale that captured as much environ- 
mental heterogeneity as possible within the limits of 
computer processing capability. 

Data processing 

Presence Data: All point locations recorded as positive for 
Cx. pipiens from Egypt (n = 239) and Lebanon (n = 83) 
surveys were included as presence locations in all models 
(Figure 1). 

Background Data: Both boosted regression and max- 
imum entropy methods predict areas with a high prob- 
ability of presence of conditions suitable for the target 
species by comparing the values of environmental pre- 
dictors at presence locations to the values of the same 
predictors at a set of random "background" locations, 
which represent the range of available environmental 
values [54]. Because our data, like many presence-only 
data sets [55,56], showed a strong bias in sampling ef- 
fort, we selected background points with the same geo- 
graphical bias as the occurrence data (Additional file 1). 

An independent data set of 79 Cx. pipiens locations, 
23 from Israel and 56 from Egypt [57], was used to as- 
sess accuracy of model predictions (Additional file 2: 
Figure SI). These data were used in all Maxent test data 
calculations. 

Assessing the contribution of human population density 
to Cx. pipiens distribution models 

Human population density has the potential to be a 
strong predictor of Cx. pipiens habitat [53,58]. However, 
because the survey locations of the original studies were 
all in close proximity to populated areas, population 
density is also a strong predictor of sampling effort. To 
control for this we ran the entire modeling process 
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Table 1 Data sources of environmental predictors used in species distribution models 



Variable code 


Data type 


Date 




Source res 


Data source 


Biol 


Annual mean temperature 


1960- 


990 


30 arc sec 


WorldClim 1 


Bio2 


Mean diurnal range 


1960- 


990 


30 arc sec 


WorldClim 1 


Bio3 


Isothermality 


1960- 


990 


30 arc sec 


WorldClim 1 


Bio4 


Temperature seasonality 


1960- 


990 


30 arc sec 


WorldClim 1 


Bio5 


Maximum temperature of the warmest month 


1960- 


990 


30 arc sec 


WorldClim 1 


Bio6 


Minimum temperature of the coldest month 


1960- 


990 


30 arc sec 


WorldClim 1 


Bio7 


Temperature annual range 


1960- 


990 


30 arc sec 


WorldClim 1 


Bio8 


Mean temperature of the wettest quarter 


1960- 


990 


30 arc sec 


WorldClim 1 


Bio9 


Mean temperature of the driest quarter 


1960- 


990 


30 arc sec 


WorldClim 1 


BiolO 


Mean temperature of the warmest quarter 


1960- 


990 


30 arc sec 


WorldClim 1 


Biol 1 


Mean temperature of the coldest quarter 


1960- 


990 


30 arc sec 


WorldClim 1 


Bio12 


Annual precipitation 


1960- 


990 


30 arc sec 


WorldClim 1 


Bio13 


Precipitation of the wettest month 


1960- 


990 


30 arc sec 


WorldClim 1 


Bio14 


Precipitation of the driest month 


1960- 


990 


30 arc sec 


WorldClim 1 


Bio15 


Precipitation seasonality 


1960- 


990 


30 arc sec 


WorldClim 1 


Bio16 


Precipitation of the wettest quarter 


1960- 


990 


30 arc sec 


WorldClim 1 


Biol 7 


Precipitation of the driest quarter 


1960- 


990 


30 arc sec 


WorldClim 1 


Bio18 


Precipitation of the warmest quarter 


1960- 


990 


30 arc sec 


WorldClim 1 


Bio19 


Precipitation of the coldest quarter 


1960- 


990 


30 arc sec 


WorldClim 1 


EVIMAX 


Maximum EVI 


2001 




250 m 


MODIS 2 


EVIMEAN 


Average annual EVI 


2001 




250 m 


MODIS 2 


EVISD 


Standard deviation of EVI 


2001 




250 m 


MODIS 2 


Population 


Population count 


2010 




30 arc sec 


Landscan 3 


"TWI 


Topographical wetness index 


2000 




90 m 


GLSDEM 4 



WorldClim Global Climate database v1.4, available at :http://www.worldcl im.org/ {accessed 28/8/2013}. 
2 Moderate Resolution Imaging Spectrometer (MODIS), available at: https://lpdaac.usgs.gov/ {accessed 28/8/2013}. 

3 LandScan (2010)™ High Resolution global Population Data Set copyrighted by UT-Battelle, LLC, operator of Oak Ridge National Laboratory under Contract 
No. DE-AC05-00OR22725 with the United States Department of Energy. 

4 Global Land Survey Digital Elevation Model (GLSDEM), available at: glcf.umd.edu/data/glsdem/ {accessed 28/8/2013} (Input elevation data set to GLSDEM for 
study area is Shuttle Radar Topography Mission (SRTM)). 

Data sources of environmental predictors used in construction of species distribution models. All layers were gridded to 926.63 m spatial resolution and projected 
into the MODIS sinusoidal projection. 



twice; once on the complete set of environmental vari- 
ables (N = 24), and once using all environmental variables 
except population density (N = 23). Including population 
density allows us to examine the relative influence of an 
important socioeconomic environmental factor, while ex- 
cluding human populations allows us to examine the bio- 
physical environmental factors on their own. 

To increase the interpretability of the models both pre- 
dictor sets were reduced using the 'gbm.simplify function, 
which creates BRT models with every possible combin- 
ation of the initial predictors, and based on minimizing 
predictive error, ranks predictors from most to least influ- 
ential, and identifies the optimal predictor set [59]. 



bag fraction of 0.75, 10-fold cross validation, and a 
Bernoulli loss function. All other parameters were left at 
default settings. Predictions in geographic space were 
made in R using the "raster" package [60]. 

Maxent models using the reduced predictor sets were 
fit using Maxent v 3.3.3 k [61]. Spatial predictions of the 
model were "clamped", a Maxent feature which reduces 
the value of the predictions when extrapolating into 
areas of parameter space that exceeded the range of 
values present in the training data. The final model was 
the average logistic output of 10 repetitions of the mod- 
eling process. Additional specifics on program specifica- 
tions can be found in Additional file 1. 



Creating species distribution models 

BRT models using the reduced predictor sets were fitted 
using a learning rate of 0.05, a tree complexity of 5, a 



Evaluating and comparing model predictions 

Maxent models were evaluated using the area under the re- 
ceiver operating characteristic curve (AUC), the corrected 
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Aikaike information criterion (AICc), and the omission rate. 
BRT models were evaluated using AUC, point biserial cor- 
relation (COR), and the Bernoulli deviance. Because the 
two techniques use different test points for the intrinsic 
measures of model performance, the point biserial correl- 
ation was also calculated for all models at a common set of 
158 points made of the 79 test presence locations and 79 
random background points (Additional file 2: Figure SI). 
Pearson s correlation coefficient was used to measure how 
closely predictions agreed between the two methods. 
Agreement was compared in the "training region", using 
the common 158 points, as well as an "expanded region" of 
158 points drawn from the entire model extent (Additional 
file 2: Figure SI). 

Results 

Cx. pipiens habitat suitability 

The predicted maps of suitable Cx. pipiens habitat are 
presented in Figures 2, 3, 4 and 5. Overall, maximum en- 
tropy models predicted a broader distribution of suitable 
habitat than boosted regression methods, which pro- 
duced much more conservative predictions extent of 
suitable habitat and a lower probability of presence. 
However, relative suitability of habitats was consistent 
between models: areas of highest suitability identified by 
BRT were also the areas of highest suitability selected by 
maximum entropy. 

All four models agree in predicting a moderate to high 
probability of Cx. pipiens presence in several distinct 
habitats: along the banks of the Nile and throughout 
the Nile Delta, throughout the coastal plains of Israel, 
Lebanon, and Syria, as well as the valleys of east Lebanon 
and along Israels Jordanian border (Figures 2, 3, 4, and 5). 
In models built using human population data (Figures 2 
and 4), habitats with the highest probability of presence 
were found near large population centers. Both BRTPop 
and MaxentPop models also indicate moderate probability 
of presence along the banks of the Tigris and Euphrates 
rivers. Models built without population data (Figures 3 
and 5) predicted greater probability of presence in the 
semi-arid Central Anatolia region of Turkey, the Jazirah 
plain in North East Syria and Northern Iraq, and the 
marshes of southern Iraq than the population-inclusive 
models. However, much of Iraq and Eastern Turkey oc- 
cupy a region of environmental space that falls outside the 
range of the training data (Additional file 2: Figures S3 
and S4), so predictions in these areas should be inter- 
preted cautiously. 

Model accuracy 

Evaluation metrics for Maxent and BRT models are pre- 
sented in Tables 2 and 3. Among Maxent models, both 
sets of predictors performed well, with AUC values sig- 
nificantly greater than the null model. The test AUC 



values for both predictor sets were smaller than the 
training AUC values, which indicates slight over- fitting 
of the models. This difference was smaller for the model 
created without population data. The "NoPop" model 
also had a higher test AUC value, greater entropy, and a 
smaller omission rate than the model that included 
population. The population-inclusive model had a lower 
AICc than the "NoPop" model. 

The boosted regression models also performed well, and 
also displayed a degree of over-fitting similar to the Max- 
ent models. Among the BRT models, population-inclusive 
models performed slightly better than the "NoPop" 
models on test data- with higher cross-validated AUC 
values, higher cross-validated COR values, and lower cross 
validated deviance scores. 

When evaluated over a standard set of points, the Maxent 
"NoPop" model had the strongest correlation (COR = 0.534) 
between model predictions and presence/pseudo-absence, 
the BRT NoPop and Maxent population-inclusive model 
performed equally well, and the BRT population-inclusive 
model had the weakest correlation (COR = 0.413) of the four 
models evaluated. 

Model agreement 

When evaluated over the training region, the predictions 
of Maxent and BRT models were very strongly correlated, 
with models including population data more closely corre- 
lated than models which excluded it (Figure 6). When 
evaluated over the entire modeled region, the correlation 
between models was weaker. 

Contribution of environmental predictors to Cx. pipiens 
distribution models 

Parameter reduction of the full predictor set (n = 24) re- 
sulted in a simplified set of 9 significant predictors. 
(Table 4). Parameter reduction of the full predictor set 
excluding human population density (n = 23) resulted in 
a simplified set of 14 significant predictors (Table 5). 

Both modeling techniques selected similar variables as 
the most important environmental factors driving spe- 
cies distribution. Population was the largest single con- 
tributor to model predictions, ranked as most influential 
under both BRT (34.9%) and Maxent (60.6%). 

In models created without human population density, 
the variable with the single largest contribution was the 
standard deviation of the enhanced vegetation index 
(EVI). Mean EVI was also highly ranked in all models 
and all data sets. 

Discussion 

Model agreement and predictions 

Our models showed strong agreement between boosted 
regression and maximum entropy methods in selection 
of habitat with the highest probability of occurrence of 
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Figure 3 Maxent No Pop. Probability of Culex pipiens presence based on a species distribution model generated using maximum 

entropy and a set of environmental predictors (N = 14) that do not include human population density. Habitat suitability has been 

converted to probability of presence, assuming a prevalence of 0.5. Bottom: Enlarged to show population centers, 
k J 
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Figure 4 BRT Pop. Probability of Culex pipiens presence based on a species distribution model generated using boosted regression 
trees and a set of environmental predictors (N = 9) that includes include human population density. Bottom: Enlarged to show 
population centers. 
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Table 2 Evaluation parameters for species distribution models built using human population data 



Model 


Parameters 


Observed 


Observed 


Null model 


Null 


Prevalence 


Entropy 


AlCc 


Omission rate 3 


Test COR 


Test 




in final 


training 


test AUC 


training 


model 








N = 79 


(N = 158) 


deviance 




model 


AUC 




AUC 


test AUC 












(N = 158) 


Maxent 


9 


0.938 


0.872 


0.827 ±0.01 


0.450 +/- 


0.151 


8.043 


8383 


0.206 


0.470 


197.9 


Pop 1 










0.03 














Maxent 


14 


0.916 


0.879 


0.848 + 0.01 


0.476 +/- 


0.185 


8.252 


8916 


0.194 


0.534 


178.3 


NoPop 2 










0.04 















^ata sources used for model development: Bioclim + Vegetation + Population (N = 24). 
2 Data sources used for model development: Bioclim + Vegetation (N = 23). 

3 Using equal test sensitivity and specificity threshold. Maxent Pop threshold =0.22, Maxent No Pop threshold =0.34. 

Evaluation parameters for Culex pipiens distribution models generated using a set of environmental predictors including human population density ("Maxent Pop") 
and a set excluding human population density ("Maxent No Pop"). Models were built using 322 presence points. "Null Model" parameters represent the average 
value of models built from one hundred 322-point dummy data sets. "Test AUC" and "Omission Rate" are calculated from an independent data set of 79 presence 
points, not used in training the models. "Test COR" and "Test Deviance" are calculated from a data set that includes the 79 test data points and 79 background 
points. "Test COR" = Pearson correlation coefficient. 



Cx. pipiens. High AUC values for all four models indi- 
cated that occupancy sites were highly likely to be 
assigned a higher probability of presence than back- 
ground sites regardless of method. The strong correl- 
ation between BRT and Maxent outputs in areas with a 
high proportion of known Cx. pipiens habitat indicates 
that the models strongly agree on areas of potentially 
high risk. These areas included populated areas within 
the Nile delta, the valleys of Israel, Lebanon, and Jordan, 
and southwestern Saudi Arabia. 

Our model predictions are supported by agreement 
between predicted areas of highest probability of Cx. 
pipiens occurrence, and recent positive vector surveys, 
or outbreaks of vector-associated diseases. The regions 
of central Israel, predicted by our models to be areas of 
high probability of Cx. pipiens presence, corresponded 
to the areas of highest incidence of West Nile virus dur- 
ing the 2000 outbreak [8]. Likewise, the region of highest 
probability of occupancy in Saudi Arabia corresponds 
with recent collections of Cx. pipiens [62-64] and the in- 
cidence of infection during the 2000 Rift Valley Fever 
virus outbreak [65]. The distribution of suitable Cx. 
pipiens habitat in Saudi Arabia is of special interest be- 
cause of the high potential for repeated import of dis- 
eases into the country via large numbers of pilgrims 
travelling through for the annual Hajj [66]. The city of 
Jeddah has twice been hit by epidemics of dengue dir- 
ectly following the Hajj [67]. Our results indicate that 
this same area is a suitable habitat for supporting Cx. 



pipiens, and potentially capable of transmitting West 
Nile virus, should it be re-introduced. 

Agreement between model predictions was less corre- 
lated when evaluated over the full region, including areas 
where models were required to extrapolate. This is a 
result of the different behavior of each algorithm in 
extreme environments [68]. Maxent, when clamped, 
extrapolates in a horizontal line from the fit at the most 
extreme value in the training data. BRT, which does not 
use clamping, extrapolates at a constant value from the 
last known site. This difference in extrapolation can also 
be seen in the spatial predictions of the population- 
exclusive models when predicting into the most dissimi- 
lar areas of the modeling region. BRT assigns a higher 
relative probability of occupancy to habitat in northern 
Iraq and southern Syria, classifying the area with a prob- 
ability of occupancy similar to that of the river valley 
along the Israel-Jordan border. The Maxent model, 
which constrains its predictions more severely, predicts 
a much more moderate probability of occupancy. The 
same relative over-estimation of the BRT algorithm is not 
as apparent in the population-inclusive model, because 
the environmental predictor that most exceeds its training 
values in this region is temperature seasonality, which was 
not as influential a predictor in the population-inclusive 
models as it was in the population-exclusive models. The 
effect of clamping can still be seen in the population- 
inclusive Maxent model. An advantage of using two differ- 
ent modeling methods is our ability to detect regions of 



Table 3 Evaluation parameters for species distribution models excluding human population data 



Model 


Parameters in 


#trees 


Training AUC 


CV AUC 


Training COR 


CVCOR 


CV deviance 


Test COR 


Test deviance 




final model 














(N = 158) 


(N = 158) 


BRT Pop 1 


9 


1550 


0.945 


0.889 


0.663 


0.431 


0.198 


0.413 


453.3 


BRT No Pop 2 


14 


3050 


0.982 


0.864 


0.745 


0.361 


0.212 


0.470 


395.8 



^ata sources used for model development: Bioclim + Vegetation + Population (N = 24). 
2 Data sources used for model development: Bioclim + Vegetation (N = 23). 

Evaluation parameters for Culex pipiens distribution models generated using boosted regression trees from a set of environmental predictors including human 
population density ("BRT Pop") and a set excluding human population density ("BRT No Pop"). Models were built using 322 presence points, and evaluated using 
10 fold cross validation. "CV AUC" and "CV Deviance" are the area under the curve (AUC) and Bernoulli deviance evaluated at the withheld dataset. "Test COR" 
and "Test Deviance" are calculated from the predicted model values at 79 independent occurrence points not used in training the mode, and 79 background 
points from the same region. 'Test COR" = Pearson correlation coefficient. 
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Figure 6 Correlation between predictions of Culex pipiens distribution models created using boosted regression trees (BRT) and 
maximum entropy (Maxent) algorithms. 'Training region" consists of 158 points taken from within a similar geographical area to the samples 
used building the model, 79 background points and 78 independent occurrence points. "Expanded Region" measures model agreement 
throughout the entire modeled extent, and compares values at 158 random points taken with equal probability from the entire model extent. 
Left panels describe models built using human population density as a parameter (N = 9). Panels on the right describe models excluding human 
population density (N = 14). COR = Pearson correlation coefficient. 



parameter space where choice of the underlying modeling 
algorithm has the greatest influence on strength of predic- 
tions. Results in these areas need to be interpreted more 
cautiously than areas where both models are in agreement. 

Presence-only modeling methods, such as those used 
in this study, make the assumption that a target species 
is equally detectable across all habitats [35]. However, if 
sampling detection probability varies with one or more 
environmental predictors, our model will not distinguish 



between habitat with a higher probability of occupancy 
and a habitat with greater detectability. Absolute values 
of the predictions should be interpreted with caution 
with this limitation in mind. Predictions should also be 
evaluated with the consideration that presence-only 
models treat densely and sparsely occupied sites the 
same, as the input data are binary. 

In order to efficiently predict species distributions 
across the study area, we were required to coarsen the 



Table 4 Contribution of environmental parameters to SDMs including human population data 


Parameter 


BRT variable importance 


Maxent variable importance 


Maxent permutation importance 


BRT rank 


Maxent rank 


Population 


34.9 


60.6 


52.8 


1 


1 


EVIMEAN 


11.1 


7.5 


0.8 


2 


3 


Bio8 


9.1 


6.1 


0.5 


3 


5 


EVIMAX 


8.7 


3.5 


1.0 


4 


6 


Bio12 


8.4 


6.2 


3.4 


5 


4 


EVISD 


8.4 


11.9 


39.1 


6 


2 


Bio4 


7.1 


1.2 


1.3 


7 


8 


Bio6 


6.7 


2.4 


0.0 


8 


7 


Bio7 


5.6 


0.7 


1.0 


9 


9 



Contribution of environmental parameters (Table 1) to species distribution models for Culex pipiens generated using boosted regression trees (BRT) and maximum 
entropy (Maxent) methods. 
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Table 5 Contribution of environmental parameters to SDMs excluding human population data 


Parameter 


BRT variable importance 


Maxent variable importance 


Maxent permutation importance 


BRT rank 


Maxent rank 


EVISD 


16.3 


44.7 


73.6 


1 


1 


EVIMEAN 


14.7 


12.6 


3.3 


2 


2 


EVIMAX 


9.7 


7.9 


11.3 


3 


4 


TWI* 


9.4 


2.9 


1.9 


4 


9 


Bio4 


8.2 


1.8 


5.5 


5 


10 


Bio 10* 


5.7 


3.1 


0.1 


6 


8 


Biol 5* 


5.0 


3.3 


1.8 


7 


7 


Bio12 


5.0 


4.6 


0.1 


8 


5 


Bio8 


4.9 


11.7 


0.2 


9 


3 


Bio7 


4.7 


0.7 


0.1 


10 


13 


Bio2* 


4.3 


0.4 


0.5 


11 


14 


Bio6 


4.3 


3.6 


0.5 


12 


6 


Bio5* 


4.1 


1.6 


0.0 


13 


11 


Bio9* 


3.7 


1.0 


1.0 


14 


12 



*Environ mental features not utilized in population-inclusive models (Table 4). 

Contribution of environmental parameters (Table 1) to species distribution models for Culex pipiens generated using boosted regression trees ("BRT") and 
maximum entropy ("Maxent") methods. 



resolution of environmental predictors to 1 km. It is im- 
portant to keep in mind that this may be a larger than 
optimal scale for examining the relationship between 
vectors and certain predictors with very fine scale het- 
erogeneity [25,69]. 

Relationship between Cx. pipiens and environmental 
predictors 

Our study is unique in its aim of identifying the under- 
lying environmental predictors driving Cx. pipiens distri- 
bution across a climate gradient that encompasses hyper 
arid to sub humid habitats. The novel aspects of our 
findings are the importance of human population dens- 
ity and seasonality of vegetation indices as powerful pre- 
dictors of Cx. pipiens occurrence. 

Human population density was identified by both 
modeling methods as the highest contributing predictor 
to habitat suitability. Interpreting this result is complex, 
because areas of dense human habitation may be favor- 
able to mosquitoes for several reasons. Humans are a 
host species, their dwellings also provide a sheltered 
resting area, and in rural areas human activities such as 
irrigation and agriculture may also provide favorable 
breeding habitat and non-human hosts. Breeding condi- 
tions for mosquitoes are also favorable in slums, where 
high human population density is combined with poor in- 
frastructure and inadequate services [70] . Characterization 
of mosquito breeding habitat in urban Cairo found that 
93.5% of breeding sites were found in slum areas, charac- 
terized by incomplete construction, disorganized roads, 
and crowded, dense conditions [71]. Socioeconomic 
conditions were also significantly related to Cx. pipiens 



breeding sites in Washington DC, although in the north- 
ern temperate region the relationship was reversed, breed- 
ing sites were positively associated with presence of 
functional containers, like flower pots and garbage pails, 
which were found primarily in areas of higher socioeco- 
nomic status [27] . 

Our results indicate that quality and seasonality of 
vegetation is a powerful predictor of Cx. pipiens in arid 
and semi-arid habitats. The relationship between Cx. 
pipiens occurrence and vegetation quality was strongly 
positively associated with the maximum value of the en- 
hanced vegetation index (EVI), and negatively associated 
with the mean and standard deviation (seasonality) of 
the EVI. This suggests that in our study region, high 
quality Cx. pipiens habitat consists of areas of high pri- 
mary productivity, which maintain that quality of habitat 
with very little change throughout the year. In rural 
areas of Egypt, year round irrigation maintains precisely 
this kind of habitat. Vegetation indices are not always 
the most informative predictors in the region, in a study 
examining another Cx. pipiens vectored infection, filaria- 
sis, the Normalized Difference Vegetation Index did not 
distinguish between Egyptian villages at high and low 
risk of infection [72]. The positive relationship between 
Cx. pipiens and stable areas of high primary productivity 
suggested by our results is the opposite of the relation- 
ship seen in the forested northeastern United States, 
where the most accurate model of Cx. pipiens abun- 
dance was negatively correlated with forest cover and 
did not include the vegetative index at all [25]. 

Topographic wetness index has been a significant pre- 
dictor of anopheline abundance and malaria risk among 
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households in Thailand [73] and Kenya [74,75]. Despite 
this, in our models it was a significant predictor only in 
the BRT "NoPop" model It is possible that in arid cli- 
mates with low humidity and little rainfall, the potential 
for pooling water, as predicted by TWI, does not trans- 
late into pools of water with enough frequency to make 
TWI a stronger indicator of vector breeding habitat. It is 
also possible that, even if TWI does accurately predict 
the distribution of pools of clear groundwater in the re- 
gion, those pools of clear groundwater may not be the 
breeding habitat most favored by Cx. pipiens. In habitats 
where mosquitos chose to oviposit more frequently in 
artificial or natural containers, rather than ground pools, 
this will not be reflected in the TWI. It is also notable 
that none of our models found rainfall to be a very 
powerful predictor. In this context, the vegetation indi- 
ces may provide better information on available soil 
moisture than hydrology or rainfall parameters. 

Conclusions 

Our study provides insight into the drivers of Cx. pipiens 
distribution in an understudied region at growing risk 
from the re-emergence of several arboviruses. The most 
dominant predictors in our species distribution model, 
human population density and the seasonality of the en- 
hanced vegetation index, are also both environmental 
factors with potential correlations with urbanization and 
agricultural practices. By understanding the relationship 
between these predictors and disease vector distribu- 
tions, we gain a better understanding of how changes in 
land use or shifting patterns of human settlement might 
influence disease transmission. Given the resurgence of 
vector borne diseases like West Nile and Rift Valley 
Fever [10] understanding how choices may influence dis- 
ease ecology through agriculture, urbanization, and 
population growth, is a topic of vital importance. 
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